Abstract: We explore in this paper whether automated scaffolding delivered via a
pedagogical agent within a simulation can help students acquire data collection 
inquiry skills. Our initial analyses revealed that such scaffolding was effective for 
helping students who initially did not know two specific skills, designing controlled 
experiments and testing stated hypotheses, acquire those skills. These results provide 
evidence towards realizing rigorous, scalable, performance-based assessment of 
scientific inquiry skills and the efficacy of an automated scaffolding approach. 


Introduction 

Science educators and researchers agree that cultivating inquiry and critical thinking skills is
necessary for students to become scientifically literate [1], [2], [3], [4] and to be well-poised 
for the demands of the knowledge-based economy of the 21st century [5]. However, learning 
inquiry skills is challenging for students (e.g. [6], [7], [8], [9], [10]). These challenges can 
lead to many false starts [11], misconceptions [12], and failure to learn targeted science prin- 
ciples [13]. Given the importance of inquiry and these challenges, it is important to 
understand how best to foster learning of these skills (cf. [13], [14]). 

Relevant to this paper, we consider skills at designing and conducting experiments 
(cf. [1]). Some studies showed that explicitly teaching strategies like controlling for variables 
[15] can lead to acquisition [16], [17], [18], [19], retention [18], [20], and transfer [16], [17], 
[18] of the strategy over pure discovery methods. Others showed that long-term, repeated 
practice also promotes skill acquisition, without instruction [21], [22], [23]. Scaffolding- 
based approaches, on the other hand, may strike a balance between these extremes by provid- 
ing help only when students need it (cf. [24], [25]). For example, providing structure (scaf- 
folding) during open-ended inquiry activities can foster learning (e.g. [26]). Similarly, indi- 
vidualized, real-time feedback may also help students learn inquiry skills [27]. 

Towards promoting the learning of data collection inquiry skills, we determine if scaf- 
folding can help students acquire two such skills [1]: testing one’s stated hypothesis, and de- 
signing controlled experiments [28], [29]. This work is conducted within the context of a 
web-based inquiry learning environment, called Inq-ITS, in which students conduct inquiry
using simulations [29]. The learning environment was augmented to provide real-time, auto- 
mated scaffolding as students collect data with the simulation. This scaffolding is driven, in 
part, by data-mined models that determine when students are off-track [30], [28]. A random- 
ized, controlled experiment was conducted with two groups of students, those who received 
data collection scaffolds and those who did not, to determine if scaffolding impacted data 
collection skill acquisition in one set of inquiry activities on phase change, a middle school 
physical science topic. We hypothesize that our automated scaffolding approach that provides 
just-in-time feedback can improve students’ data collection skills.


Inq-ITS

Inq-ITS (Inquiry Intelligent Tutoring System) is a web-based, virtual science lab environment
that aims to automatically assess and provide students personalized feedback on their
inquiry and critical thinking skills [29]. In this environment, students conduct inquiry by 
forming hypotheses, collecting data, analyzing data, and communicating findings using inter- 
active simulations. The simulations provide a focal point around which students conduct their 
inquiry, and are designed to address concepts and misconceptions aligned to the NGSS 
standards [2]. Currently, simulations have been developed for middle school Physical, Life 
and Earth Science. In addition to simulations, students also utilize inquiry support tools, such 
as a hypothesizing tool and analyzing data tool. The tools help students conduct inquiry, keep 
track of their progress, and enable assessment by making their thinking explicit [29]. 

Inq-ITS is similar to other microworld/simulation-based discovery environments (e.g. 
[31], [32], [33], [34]) in that the computer-based activities structure students’ exploration and 
share a goal in bootstrapping the acquisition of content knowledge. In particular, like [35], it 
also emphasizes performance-based assessment of inquiry skills. Also, like [36], the system 
aims to provide real-time feedback to students as they work. Inq-ITS specifically aims to as- 
sess and provide real-time feedback on skills identified by national and state frameworks [1] 
like hypothesizing, designing and conducting experiments, interpreting data, and communi- 
cating findings. Thus, the system aims to provide students with supports so they do not 
flounder or engage in unproductive, haphazard inquiry behaviors [10], [37]. This approach is 
commensurate with a conceptualization of science inquiry described by Kuhn et al. [38], 
p.497, “students investigate a set of phenomena — virtual or real — and draw conclusions 
about the phenomena.” 

Each Inq-ITS activity is a performance assessment; the actions students take within 
the environment and work products they create are the bases for assessment. Towards the 
goal of real-time formative assessment, the system provides automatic feedback both to 
students and educators as students engage in the inquiry activities. For educators, the system 
automatically generates formative metrics and summary reports on the development of these 
skills. Educators can pinpoint which students are having difficulty and on what specific in- 
quiry skills. For students, a pedagogical agent gives immediate feedback on their work prod- 
ucts and experimentation processes to support them in improving their inquiry skills. To con- 
cretize these notions, we describe inquiry activities for Phase Change, a physical science top- 
ic that is the focus of this study, and describe how the pedagogical agent provides feedback 
on students’ data collection processes. 

The Phase Change activities [29] seek to foster understanding about the melting and 
boiling properties of water. In these activities (like all Inq-ITS activities), students are first 
given an initial goal around which their inquiry is to be conducted. In phase change, a goal 
would be to determine if one of three factors (size of a container, amount of ice to melt, and 
amount of heat applied to the ice) affects various outcomes (e.g. melting or boiling point). 
They then engage in a semi-structured scientific inquiry to address the goal as follows. First, 
they articulate a hypothesis to be tested using a hypothesis widget (Figure 1). The widget 
supports students in forming a testable hypothesis. Next, students collect data to test their 
hypothesis (Figure 2) with the Phase Change simulation. Students change the simulation’s 
variables, and then run, pause, and reset it to collect their data. A data table tool auto- 
populates and shows the data they collected thus far. Once finished, they analyze their data 
(Figure 3) by forming an argument and selecting trials to indicate whether their hypotheses 
were supported or refuted based on the data they collected. Thus, in each activity, students
hypothesize, collect and interpret data, and warrant their claims to address the goal. More
information about the learning environment and support tools is described in [29]. 

This work centers on assessing and scaffolding two skills associated with productive 
data collection, designing controlled experiments and collecting data to test one’s stated hy- 
potheses [28], [29]. Briefly, students design controlled experiments when they generate data 
that make it possible to infer how changeable factors affect outcomes. This skill relates to the 
application of the Control of Variables Strategy (CVS; cf. [15]), but unlike CVS, it takes into 
consideration all the experimental design setups run with the simulation, not just isolated,
sequential pairs of trials [28], [39]. Students test their stated hypotheses when they collect 
data that can support or refute an explicitly stated hypothesis. These skills are separated for 
two reasons. First, each skill can be demonstrated separately as students collect data. Students 
may attempt to test their hypotheses with confounded designs, or may design controlled ex- 
periments for a hypothesis not explicitly stated. As will be described below, this also enables 
the system to provide different scaffolds based on whether students have difficulty with either 
(or both) skills. Second, skill at testing hypotheses may be indicative of students’ successful 
planning and monitoring of their inquiry [40]. 
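
The idea of judging controlled experiments over all trials, rather than only over consecutive pairs, can be illustrated with a minimal Python sketch. This is a simplified rule written for illustration, not the data-mined detector used in Inq-ITS; the trial representation, the variable names, and the function names are all assumptions.

```python
from itertools import combinations


def differing_variables(trial_a, trial_b):
    """Return the names of the variables whose values differ between two trials."""
    return {name for name in trial_a if trial_a[name] != trial_b[name]}


def controlled_pairs(trials, target):
    """Return index pairs of trials that differ in `target` and in nothing else.

    Every pair of trials is compared, not only consecutive ones.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(trials), 2)
        if differing_variables(a, b) == {target}
    ]


# Hypothetical trials using the three Phase Change factors (values are made up).
trials = [
    {"container_size": "small", "ice_amount": 100, "heat_level": "low"},
    {"container_size": "large", "ice_amount": 200, "heat_level": "low"},
    {"container_size": "small", "ice_amount": 100, "heat_level": "high"},
]

heat_pairs = controlled_pairs(trials, "heat_level")
ice_pairs = controlled_pairs(trials, "ice_amount")
```

Here the first and third trials form a controlled comparison for the heat level even though they were not run back to back, while no pair isolates the amount of ice.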

Since these are process skills, assessment is based on students’ interactions with the 
simulation, like running trials and changing the simulation’s variable values (Figure 2), while 
collecting data. Next, we describe how the system provides automated feedback to help stu-
dents learn these skills.

Figure 1. In a Phase Change inquiry activity, students first attempt to construct a hypothesis
they can test using the hypothesis widget.

Figure 2. After hypothesizing, students collect data by designing and running experiments
with the Phase Change simulation. Here, the pedagogical agent Rex responds to a student
who appears to be designing controlled experiments, but is not testing their hypothesis. The
student can continue experimenting or ask Rex for more help, in this case by clicking “How do
I do that?”

Figure 3. After collecting data, students then determine if their hypothesis was supported or
not by analyzing their data. Here, students construct an argument using the analysis widget
pulldown menus and select the trials they use as evidence to warrant their claim.


Real-Time Scaffolding and Evaluation of Data Collection Skills 

Inq-ITS delivers scaffolds and hints to students via a pedagogical agent named Rex, a cartoon 
dinosaur (Figure 2). Rex provides feedback as students experiment when he detects they are 
off track. If a student continues to struggle, more targeted feedback is provided, similar to 
Cognitive Tutors (e.g. [41], [42], [43]). For example, if Rex detects that a student is designing 
controlled experiments but not collecting data to test their hypothesis, Rex will say “It looks 
like you did great at designing a controlled experiment, but let me remind you to collect data 
to help you test your hypothesis.” If the student continues struggling, “bottom-out” feedback
is given (cf. [42]): “Let me help some more. Just change the [IV] and run another trial. Don't 
change the other variables. Doing this lets you tell for sure if changing the [IV] causes chang- 
es to the [DV]”. While collecting data, students may exhibit skill at testing their hypotheses, 
designing controlled experiments, both, or neither. To account for these possibilities, we im- 
plemented different scaffolding levels for each case in which the skills were not demonstrat- 
ed. Finally, we also provided on-demand help [44] students can activate on their own to ask 
Rex for more clarification (e.g. clicking “How do I do that?” for the scaffold presented in 
Figure 2). The full hierarchy of scaffolds for data collection used in this study is provided in
the Appendix. 
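
The escalation described above can be sketched in Python as follows. The message texts are taken from the Appendix; everything else is an assumption made for illustration, including the function name, the `{iv}`/`{dv}` placeholders standing in for [IV] and [DV], and the policy of advancing one level per repeated detection and then staying at the bottom-out message.

```python
# Keys are (designs_controlled, tests_hypothesis); values are messages by level.
SCAFFOLDS = {
    (False, False): [
        "I think the data you're collecting won't help you test your hypothesis "
        "because you aren't designing a controlled experiment.",
        "Let me help you some more. You said you wanted to test if changing the {iv} "
        "affects the {dv} in your hypothesis. To do this, run pairs of trials where "
        "you: (1) Change only the {iv}, and (2) Keep all the other variables the same.",
        "Let me help you some more. Just change the {iv} and run another trial. "
        "Don't change the other variables. Doing this lets you tell for sure if "
        "changing the {iv} affects the {dv}.",
    ],
    (True, False): [
        "It looks like you did great at designing a controlled experiment, but let "
        "me remind you to collect data to help you test your hypothesis.",
        "Let me help again. You said you wanted to test if changing the {iv} affects "
        "the {dv} in your hypothesis. Keep designing controlled experiments, and "
        "collect data for different values of the {iv}.",
        "Let me help some more. Just change the {iv} and run another trial. "
        "Don't change the other variables. Doing this lets you tell for sure if "
        "changing the {iv} causes changes to the {dv}.",
    ],
    (False, True): [
        "I see you're collecting data about the {iv}, but you can't test your "
        "hypothesis because you aren't designing a controlled experiment.",
        "Let me help again. You said you wanted to test if changing the {iv} affects "
        "the {dv} in your hypothesis. To do this, run pairs of trials and change only "
        "the {iv}. Keep all the other variables the same for the second trial.",
        "Let me help you some more. Just change the {iv} and run another trial. "
        "Don't change the other variables. Doing this lets you tell for sure if "
        "changing the {iv} affects the {dv}.",
    ],
}


def next_scaffold(designs_controlled, tests_hypothesis, times_flagged, iv, dv):
    """Return the next message, or None when both skills are demonstrated.

    `times_flagged` counts earlier scaffolds given for this case (assumed policy:
    one level per repeated detection, capped at the bottom-out level).
    """
    if designs_controlled and tests_hypothesis:
        return None
    levels = SCAFFOLDS[(designs_controlled, tests_hypothesis)]
    level = min(times_flagged, len(levels) - 1)
    return levels[level].format(iv=iv, dv=dv)


first = next_scaffold(True, False, 0, "amount of ice", "melting point")
bottom_out = next_scaffold(True, False, 5, "amount of ice", "melting point")
```

Keying the messages on the pair of skill judgments mirrors the point that a student may show one skill, both, or neither, and that each off-track case gets its own sequence of messages.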

The proactive, automated feedback approach was chosen for two reasons. First, prior 
work suggests that in general, students have difficulty engaging in productive inquiry without
support [11], [40], [10]. Second, students may lack the metacognitive help-seeking skills to 
recognize when they should ask for help [45], [44]. Reacting when students appear to be off- 
track, therefore, may be beneficial [46], [47]. A pedagogical agent was chosen specifically
because such agents have been shown to benefit learners (cf. [48], [49]), possibly by increasing
students’ engagement and motivation [50], [51], [52], [53], [54]. 

The core of this approach hinges on the system’s ability to evaluate students’ experi- 
mentation patterns to determine when a student demonstrates good data collection skills, and 
subsequently intervene with automated scaffolding when they do not. The assessment challenge
is that these process skills are ill-defined; students’ data collection patterns can vary
widely and there are many ways to successfully demonstrate (or not demonstrate) them [55]. 
In our approach, we built and validated data-mined detectors² to evaluate students’ data collection
[28], [30], [56], [57]. This approach was chosen for two reasons. First, the approach
attempts to overcome limitations of other models that either under- or over-estimate students’ 
skill [55]. Data mining attempts to account for “corner” cases when students do not conduct 
their inquiry in lock-step, unlike other approaches that make this assumption (e.g. [27], [33]). 
Again, this is particularly important since there is variability in the ways in which students
interleave behaviors and strategies when designing experiments [55]. Second, the data mining
approach enables easier validation of how well the model performs for new student interac- 
tions and new activities by testing it against data not used to build it, addressing issues of re- 
liability and scalability in performance-based assessments (see [55], [57] for a discussion). 

The detectors aim to replicate a human expert’s ability to look at a student’s log file 
and determine whether or not they designed controlled experiments and tested their stated 
hypotheses. Thus, to train and test the detectors, labels of students’ log files were generated 
using text replay tagging [58], [28]. In this process, a human expert looks at “pretty-printed”
versions of log files and labels whether the student designed controlled experiments and/or 
tested their stated hypotheses. From there, a feature set (indicators of whether or not students
demonstrate skills, computed over the log files) was derived. Example features considered
from the full set outlined in [59], [30] include: number of trials run, number of hypotheses
stated, count of pairwise controlled trials, time spent running experiments, and number of
simulation pauses. The detector, originally built for Phase Change, was further refined by
choosing features that increased the theoretical construct validity of the detector, and by
iteratively refining it to find an optimal feature set [30], [57]. Full details about the detector
construction process and its features can be found in [30], [57].

² In our prior work, we identified a situation in which the detector for designing controlled
experiments incorrectly identifies skill demonstration. For this situation only, we authored a
rule to evaluate experimentation [30].
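
To make the notion of features computed over log files concrete, the sketch below derives a few of the example features from a small event log. The log format, field names, and feature names are hypothetical, and elapsed time between the first and last event is used only as a rough stand-in for time spent running experiments; the actual feature definitions are given in [30], [57].

```python
from itertools import combinations


def extract_features(events):
    """Compute a few illustrative features from a list of logged events.

    Each event is a dict with a 'time' in seconds and an 'action'; 'run' events
    also carry the simulation 'settings' used for that trial (assumed format).
    """
    runs = [e["settings"] for e in events if e["action"] == "run"]
    # A pair of trials counts as controlled when exactly one variable differs.
    controlled = sum(
        1
        for a, b in combinations(runs, 2)
        if sum(a[name] != b[name] for name in a) == 1
    )
    return {
        "num_trials": len(runs),
        "num_pauses": sum(e["action"] == "pause" for e in events),
        "num_controlled_pairs": controlled,
        "total_time": events[-1]["time"] - events[0]["time"] if events else 0,
    }


# Hypothetical log for one student's data collection.
events = [
    {"time": 0, "action": "run", "settings": {"ice_amount": 100, "heat_level": "low"}},
    {"time": 12, "action": "pause"},
    {"time": 20, "action": "reset"},
    {"time": 25, "action": "run", "settings": {"ice_amount": 100, "heat_level": "high"}},
    {"time": 40, "action": "run", "settings": {"ice_amount": 200, "heat_level": "low"}},
]

features = extract_features(events)
```

A feature dictionary of this kind, paired with the human labels from text replay tagging, is the sort of input a classifier could be trained and validated on.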

We also conducted extensive validation tests to show the applicability of these detec- 
tors across our physical science activities to identify the skills and determine when students 
are off-track. In the context of Phase Change, for example, they have been shown to ade- 
quately identify skill when students complete their experimentation [30]. They also could be 
used, as is, to detect when students are off-track and thus can be used to drive scaffolding 
before students complete their data collection [30]. Finally, they could be applied to new stu- 
dents [56] and to detect skill demonstration in other physical science topics [56], [57], and a 
life science topic with a complex systems simulation [60]. 

In this work, we aim to determine whether the data collection scaffolds are effective 
at helping students acquire the two data collection skills in the context of Phase Change, a 
physical science Inq-ITS activity set. The detectors are leveraged both to determine who 
should receive scaffolding and to evaluate who acquired the skills. 


Method 


Participants 

Participants were 299 eighth grade students from three schools in Central Massachusetts. 
Some had prior experience conducting inquiry within Inq-ITS, and for others, this was their 
first experience. 


Procedure 

Five inquiry-based activities were developed for Phase Change. Three targeted specific Phase 
Change concepts and two had students test their own hypotheses, subject to the factors they 
could vary with the simulation. For the first four activities, students practiced inquiry in one 
of two learning conditions, randomly chosen by the system: 

• Data collection scaffolding (DCS) condition: Scaffolds for writing a testable hypothesis
  (Figure 1) and for collecting data (Figure 2) were given to students if the system
  detected they were off-track. Scaffolds for analyzing data were not provided.

• No data collection (NoDCS) scaffolding condition: Only scaffolds for writing a testable
  hypothesis were present.


We highlight that students in the “No Data Collection (NoDCS) scaffolding condi- 
tion” received scaffolds on constructing a testable hypothesis with the hypothesis widget 
(Figure 1), but not how to collect data to test that hypothesis. Furthermore, neither condition 
received scaffolds on analyzing data. This experimental design was chosen for two reasons. 
First, the design guarantees that students formulate a syntactically correct, testable hypothesis 
when they enter the experiment phase. Second, the design ensures that the efficacy of the data 
collection scaffolds is tested in isolation. For example, if the system provides feedback during 
data analysis to collect more data (e.g. all the student’s trials are confounded, preventing a 
good data analysis), the feedback could affect their performance at data collection. This 
would prevent us from disentangling the impacts of data collection scaffolds from analyzing 
data scaffolds. 

Finally, both groups completed an “immediate acquisition test”, a fifth Phase Change 
activity with no scaffolding for hypothesizing, data collecting, or analyzing data. This enabled
measuring the impacts of scaffolding on skill acquisition when the scaffolds were removed.


Results 

We aim to determine the efficacy of our automated scaffolding approach for helping students 
acquire two data collection inquiry skills, designing controlled experiments and testing one’s 
stated hypotheses. Efficacy is determined by evaluating if students who received scaffolding 
(DCS condition) were more likely to demonstrate the skills in the final, completely unscaf- 
folded (fifth) Phase Change activity, than those who did not receive scaffolding (NoDCS). As 
mentioned, the detectors evaluate whether or not students demonstrated the two skills. 

In this analysis, we consider only students who did not demonstrate either skill in 
their first data collection opportunity for two reasons. First, this approach accounts for stu- 
dents in either condition who may already know both skills. Second, it accounts for students 
in the DCS condition who may never have received scaffolding because they already knew
the skills. This enables a more rigorous test of the efficacy of the scaffolding approach. From
the original set of 268 students, 123 students who did not design controlled experiments in
their first data collection, and 95 students who did not test their stated hypotheses, were used
in the analyses.

As shown in Table 1, the scaffolding approach appeared to help students who initially 
did not know the skills acquire those skills. More specifically, 92.9% of students who did not
initially design controlled experiments in the DCS condition did so in the unscaffolded Phase
Change activity compared to 58.5% of the students in the NoDCS condition, χ²(1) = 20.79,
p < .001. In addition, 91.7% of students in the DCS condition tested their stated hypotheses,
compared to 53.2% of the students in the NoDCS condition, χ²(1) = 17.69, p < .001. The
implications of these findings are discussed next.
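
The two reported test statistics and percentages can be checked against the counts in Table 1 with a short Python sketch. It assumes an uncorrected Pearson chi-square on each 2×2 table; the paper does not state which variant was used, and the function name is ours.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: NoDCS, DCS. Columns: skill not demonstrated, skill demonstrated.
controlled_experiments = chi_square_2x2(22, 31, 5, 65)
stated_hypotheses = chi_square_2x2(22, 25, 4, 44)

# Share of students demonstrating each skill in the unscaffolded activity.
dcs_controlled_rate = 65 / (5 + 65)
nodcs_controlled_rate = 31 / (22 + 31)
dcs_hypotheses_rate = 44 / (4 + 44)
nodcs_hypotheses_rate = 25 / (22 + 25)

print(round(controlled_experiments, 2), round(stated_hypotheses, 2))
```

Under this assumption both statistics land within about 0.01 of the reported values, and the four rates match the reported percentages after rounding.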


Table 1. Crosstabulations of practice condition, and whether students demonstrated skill in 
the immediate transfer test, a completely unscaffolded Phase Change inquiry activity, n = 123 
for designing controlled experiments, and n = 95 for testing stated hypotheses. The detectors 
were used here to evaluate whether students designed controlled experiments and/or tested 
their stated hypotheses. Students considered in this analysis originally did not demonstrate 
skill in their first attempt at conducting inquiry. 


                 Designed Controlled        Tested Stated
                 Experiments?               Hypotheses?
                 No        Yes              No        Yes
No DC Scaff.     22        31               22        25
DC Scaff.         5        65                4        44

                 χ²(1) = 20.79, p < .001    χ²(1) = 17.69, p < .001


Discussion and Conclusions 

In this study, we extended our inquiry environment, Inq-ITS, a computer-based environment 
that can automatically assess students’ scientific inquiry skills [29], to incorporate automated, 
real-time scaffolding. We explored whether scaffolding would help students acquire and 
transfer two data collection skills, designing controlled experiments and testing stated hy- 
potheses, in the context of one set of inquiry activities on Phase Change, a middle school 
physical science topic. The real-time scaffolding was driven, in part, by data-mined detectors 
of these skills that could determine when students were haphazard in their data collection 
[30]. Overall, we found that our scaffolding approach was effective in helping students ac- 
quire these skills by comparing two groups of students in a randomized controlled study, 
those who received scaffolding and those who did not. 

This work makes three contributions to the literature on inquiry learning and on 
providing interventions using data-mined detectors. First, these findings are particularly 
promising as an approach to simultaneously assess and support inquiry skill development in a 
scalable way because the approach is entirely computer-based. Thus, this system has the po- 
tential to be implemented readily in a classroom setting, or as virtual homework, and can 
provide individualized support to students who need it. Second, though other successful in- 
terventions have been developed that leverage data mining-based detectors (e.g. [61], [62]), 
this is the first system to our knowledge that evaluates students’ skills, particularly inquiry 
processes (the actions they take while collecting data), and uses that information to provide immediate
feedback. Third, we showed that a “middle ground” between direct instruction and
discovery learning [13], [14] has the potential to enable acquisition of these skills. The ap- 
proach employs Vygotsky’s original notion of scaffolding [24], prescribing that the scaffold 
be removed once the skill has been internalized by the student [63]. 

There are some limitations to this study. First, we did not fully address whether this 
environment could be used to teach the data collection skills to students who do not know 
these skills (cf. [64]). In general, we envision our learning environment to be an assessment 
platform that provides students just-in-time help, not as a pure instructional tool. In other 
words, we expect this tool to be used as an environment to hone inquiry skills that provides 
scaffolds as needed during practice (cf. [41], [65], [66]), after students are exposed to these 
inquiry topics in their regular curriculum [29]. Second, the evidence of acquisition and 
transfer is rooted primarily in procedural demonstration of the two data collection skills, and 
does not tap changes in conceptual / metastrategic knowledge of when and why one should 
apply them (e.g. [3], [67], [17]). One possible way to address this is to have students explain 
why they chose to design the experiments they did and code the open responses for evidence 
of such understanding (e.g. [22]). This approach, however, would be difficult to scale. 
Another possibility is to triangulate students’ performance in the “analyze data” task in which 
students make inferences about the data they collected [29] with their performance in the
“experiment” task. If students are able to successfully warrant their interpretations relative to 
their hypotheses by identifying which data enabled them to make inferences, this would be 
evidence of conceptual understanding of the data collection skills. 

Finally, though our results are encouraging, we recognize that the acquisition test was 
in the same science topic as that in which students practiced their inquiry [68]. In related 
work, we have shown that this scaffolding approach also helped students transfer these skills 
to a second physical science topic, also with a similar structure [59], [55], [69]. Our line of 
research will continue conducting similar studies using activities from dissimilar domains 
with different activity structures, like Life and Earth Science [29], to tease apart these possi- 
ble effects and determine if scaffolding enables broad transfer. In addition, we will develop 
scaffolds for other inquiry skills (e.g. analyzing data skills) and determine how these scaffolds
can promote skill acquisition and transfer in a scalable manner using our web-based
system.


Appendix


Pedagogical agent Rex’s messages given when the system detects that students are not designing
controlled experiments, not testing their stated hypotheses, or engaging in haphazard inquiry.
For each constraint, the triggered scaffold message is listed by level, followed by the help
buttons available at that level and Rex’s response to each.


Constraint: Not designing controlled experiments and not testing stated hypotheses

Level 1
Triggered Scaffold Message: I think the data you're collecting won't help you test your
hypothesis because you aren't designing a controlled experiment.
Help Button | Help Response
How do I do that? | Design a controlled experiment by changing only the variable you are
testing while keeping all the other variables the same.
Which variable am I trying to test? | It's in your hypothesis. It says you want to test if
changing the [IV] affects the [DV].
I need more help | Run trials where you: (1) Change only the [IV], and (2) Keep all the other
variables the same.
Why do this? | Changing only the [IV] while keeping everything else the same lets you tell for
sure if the [IV] affects the [DV].

Level 2
Triggered Scaffold Message: Let me help you some more. You said you wanted to test if
changing the [IV] affects the [DV] in your hypothesis. To do this, run pairs of trials where
you: (1) Change only the [IV], and (2) Keep all the other variables the same.
Help Button | Help Response
Why do this? | Changing only the [IV] and keeping everything else the same lets you tell for
sure if the [IV] affects the [DV].

Level 3
Triggered Scaffold Message: Let me help you some more. Just change the [IV] and run another
trial. Don't change the other variables. Doing this lets you tell for sure if changing the [IV]
affects the [DV].


Constraint: Designing controlled experiments, but not testing stated hypotheses

Level 1
Triggered Scaffold Message: It looks like you did great at designing a controlled experiment,
but let me remind you to collect data to help you test your hypothesis.
Help Button | Help Response
How do I do that? | Keep designing a controlled experiment, but make sure to try different
values of variable you're trying to test.
Which variable is that? | Your hypothesis says you wanted to test if changing the [IV] affects
the [DV].
Why do this? | Changing the [IV] while keeping everything else the same lets you see how
changing the [IV] affects the [DV].

Level 2
Triggered Scaffold Message: Let me help again. You said you wanted to test if changing the
[IV] affects the [DV] in your hypothesis. Keep designing controlled experiments, and collect
data for different values of the [IV].
Help Button | Help Response
Why do this? | Changing the [IV] while keeping everything else the same lets you see how
changing the [IV] affects the [DV].

Level 3
Triggered Scaffold Message: Let me help some more. Just change the [IV] and run another
trial. Don't change the other variables. Doing this lets you tell for sure if changing the [IV]
causes changes to the [DV].


Constraint: Testing stated hypotheses, but not designing controlled experiments

Level 1
Triggered Scaffold Message: I see you're collecting data about the [IV], but you can't test
your hypothesis because you aren't designing a controlled experiment.
Help Button | Help Response
How do I do that? | Design a controlled experiment by changing only the variable you are
testing while keeping all the other variables the same.
Which variable am I trying to test? | It's in your hypothesis. It says you want to test if
changing the [IV] affects the [DV].
I need more help | Run trials where you: (1) Change only the [IV], and (2) Keep all the other
variables the same.
Why do this? | Changing only the [IV] while keeping everything else the same lets you tell for
sure if the [IV] affects the [DV].

Level 2
Triggered Scaffold Message: Let me help again. You said you wanted to test if changing the
[IV] affects the [DV] in your hypothesis. To do this, run pairs of trials and change only the
[IV]. Keep all the other variables the same for the second trial.
Help Button | Help Response
Why do this? | Changing only the [IV] and keeping everything else the same lets you tell for
sure if the [IV] affects the [DV].

Level 3
Triggered Scaffold Message: Let me help you some more. Just change the [IV] and run another
trial. Don't change the other variables. Doing this lets you tell for sure if changing the [IV]
affects the [DV].